Investigate - European Soccer Database

Introduction

Dataset Description

We use the European Soccer Database from Kaggle, which covers 11 European leagues from the 2008 to 2016 seasons. The following table describes the dataset in detail:

| Table name | No. of rows | No. of columns |
|---|---|---|
| Country | 11 | 2 |
| League | 11 | 3 |
| Player | ~11,100 | 7 |
| Player_Attributes | ~184,000 | 42 |
| Match | ~26,000 | 115 |
| Team | 299 | 5 |
| Team_Attributes | 1,458 | 25 |

Questions for Analysis

Please note that I installed plotly-5.5.0 and tenacity-8.0.1 locally to run two of the charts below.

Data Wrangling

First, let's load all tables from database.sqlite and get familiar with them.
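As a minimal sketch of this loading step (not necessarily the notebook's original code), every table can be read into a DataFrame by listing the table names from `sqlite_master`. The helper name `load_all_tables` is hypothetical, and the in-memory database with made-up rows only stands in for database.sqlite so that the example runs on its own.

```python
import sqlite3

import pandas as pd


def load_all_tables(conn):
    """Return a dict mapping each table name to a DataFrame (hypothetical helper)."""
    names = pd.read_sql_query(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name", conn
    )["name"]
    return {name: pd.read_sql_query(f'SELECT * FROM "{name}"', conn) for name in names}


# Tiny in-memory stand-in for database.sqlite; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE League (id INTEGER, country_id INTEGER, name TEXT)")
conn.execute("INSERT INTO Country VALUES (1, 'Example country')")
conn.execute("INSERT INTO League VALUES (1, 1, 'Example league')")
conn.commit()

tables = load_all_tables(conn)
for name, df in tables.items():
    print(name, df.shape)  # table name with (rows, columns)
```

With the real file, the connection would be opened with `sqlite3.connect("database.sqlite")` instead of `":memory:"`.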

Summary statistics for the numeric attributes.

Both columns look close to a normal distribution.

As we can see, the mean and median in both columns are close, which suggests a roughly normal distribution for height and weight. The maximum weight is 110.223 kg. Let's explore the heaviest and the tallest players.

Note that the weight is recorded in pounds.
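Because the column is in pounds, the 110.223 kg figure quoted above comes from a unit conversion. A small sketch of that arithmetic follows; the value of 243 lb is an assumption, back-computed from the quoted kilogram figure rather than read from the dataset.

```python
# One international avoirdupois pound is defined as exactly 0.45359237 kg.
LB_TO_KG = 0.45359237


def pounds_to_kg(pounds):
    """Convert a weight in pounds to kilograms."""
    return pounds * LB_TO_KG


# Assumed maximum weight in pounds (back-computed from 110.223 kg).
max_weight_lb = 243
print(round(pounds_to_kg(max_weight_lb), 3))
```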

There are many null values, but we are only interested in the records required for Mo Salah.

All numeric columns in the Player_Attributes DataFrame are floats, which is the appropriate data type.

We ignore buildUpPlayDribbling for now; it will be dropped later.

Of course, the 2 charts show a strong right skew, in which more values are concentrated on the left side of the distribution graph while the right tail is longer. In our case, as any football (soccer) fan knows, it's normal that most matches end with a few goals or even without a goal at all (min = 0).

Let's drop all columns in which 40% or more of the values are null.
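One possible way to write this step is sketched below. The function name `drop_forty_null_column` is the one mentioned later in the notebook, but its signature and body here are assumptions, as are the column names in the toy DataFrame.

```python
import pandas as pd


def drop_forty_null_column(df, threshold=0.40):
    """Drop every column whose share of null values is at least `threshold`.

    Assumed implementation; the notebook's own version is not shown.
    """
    null_share = df.isna().mean()  # fraction of nulls per column
    to_drop = null_share[null_share >= threshold].index
    return df.drop(columns=to_drop)


# Toy data with hypothetical column names.
raw = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "mostly_null": [None, None, None, 55.0, 60.0],  # 60% null
        "mostly_full": [3.0, None, 5.0, 2.0, 4.0],      # 20% null
    }
)
cleaned = drop_forty_null_column(raw)
print(list(cleaned.columns))
```

The function returns a new DataFrame rather than modifying its input, so the original table stays available for comparison.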


First question

Data Cleaning

Merge player and player_att to create a new DataFrame that contains the player names and attributes.

Select Mohamed Salah's records with a mask.

Let's check Mo Salah's DataFrame for null values.

Great, no null or duplicate values were found.
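The merge, mask and checks above can be sketched as follows. The DataFrame names, the join key `player_api_id` and the toy values are assumptions made for illustration only.

```python
import pandas as pd

# Toy stand-ins for the Player and Player_Attributes tables (made-up values).
player_df = pd.DataFrame(
    {"player_api_id": [1, 2], "player_name": ["Mohamed Salah", "Other Player"]}
)
player_att_df = pd.DataFrame(
    {"player_api_id": [1, 1, 2], "overall_rating": [70.0, 75.0, 60.0]}
)

# Inner join on the assumed shared key, so every row carries the player's name.
players = player_df.merge(player_att_df, on="player_api_id", how="inner")

# Boolean mask that keeps only Mohamed Salah's records.
mask = players["player_name"] == "Mohamed Salah"
salah_df = players[mask]

# Null and duplicate checks.
null_count = int(salah_df.isna().sum().sum())
duplicate_count = int(salah_df.duplicated().sum())
print(len(salah_df), null_count, duplicate_count)
```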

Exploratory Data Analysis for first question

Mo's average is clearly much higher than the overall average.

From the records we can see a tremendous improvement in Mo's performance (max minus min). The important attributes that had a strong impact on his overall rating are: finishing, heading_accuracy, volleys, ball_control, acceleration, sprint_speed, agility, reactions, balance.

As we can see, for all attributes the median is greater than the mean, which indicates a left-skewed (negatively skewed) distribution.
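The comparison of median and mean is a rule of thumb rather than a guarantee, so it can help to compute the sample skewness directly as well. The sketch below uses made-up ratings, not Salah's real values, and `Series.skew()` from pandas.

```python
import pandas as pd

# Made-up ratings with a long left tail (one early low value).
ratings = pd.Series([50, 70, 72, 74, 75, 76, 77], name="illustrative_rating")

summary = {
    "mean": ratings.mean(),
    "median": ratings.median(),
    "skew": ratings.skew(),  # negative values indicate a left skew
}
print(summary)
```

Here the median is above the mean and the skewness is negative, so both indicators agree.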

Let's explore a few single attributes on their own to confirm our conclusion.

In a left-skewed distribution, more values are concentrated on the right side of the distribution graph while the left tail is longer. In our case this indicates a very good improvement in Mo Salah's performance.

Almost all values are above 90%, which reflects one of Mo's main characteristics as a player.

Parse the date column from string to datetime,

then use it to sort Mo's records in ascending order.
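A minimal sketch of the parsing and sorting step, assuming the column is called `date` and holds strings in a year-month-day format. The rows are illustrative only.

```python
import pandas as pd

# Illustrative records with the date stored as a string.
salah_df = pd.DataFrame(
    {
        "date": ["2015-04-24 00:00:00", "2013-02-15 00:00:00", "2014-09-18 00:00:00"],
        "overall_rating": [3.0, 1.0, 2.0],  # placeholder values
    }
)

# Convert the strings to datetimes, then sort from oldest to newest.
salah_df["date"] = pd.to_datetime(salah_df["date"])
salah_df = salah_df.sort_values("date").reset_index(drop=True)

# A year column makes per-year comparisons straightforward.
salah_df["year"] = salah_df["date"].dt.year
print(salah_df)
```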

We definitely see an increase in all attributes.

From the start of his professional journey we notice a continuous increase in his attributes, but in an uneven way. Let's look at each year.

First, for the histogram part: Mo's performance increased in 2013, went down in 2014, then increased again strongly.

It is very important to note that 2016 only shows 2 records, from the first months of the year, so we will ignore 2016 for lack of overall scores.

From the box plot, we notice that the third quarter of the overall performance (Q2 to Q3) took only 10 months, from September 2014 to July 2015. Let's check it in more detail.

Finishing. The chart clearly shows that Mohamed Salah's finishing developed while he was on loan at the Italian club Fiorentina between January and May 2015. His finishing during the Chelsea period was (68%), but it progressed at Fiorentina until it reached (76%) on April 24, 2015, in his last period with the Italian team.

Volleys. Aerial play in particular saw the most prominent development in Salah's abilities during his loan at Fiorentina. When he arrived at Chelsea, his rating peaked at (48%), but during his time with Fiorentina he reached (68%), which reflects a notable development in Salah's handling of aerial balls.

Ball_control. Although Salah reached high ball-control numbers over the eight-year period covered by the data, his ball control decreased to (81%) during the year he spent at Chelsea. It rose again during his loan at Fiorentina, reaching (85%) in his last period with the Italian team (April 24, 2015).

Reactions. Salah's reactions improved at Fiorentina compared with before: his rating was (69%) at the end of his period at Chelsea and reached (74%) at the end of his period with Fiorentina (April 24, 2015).

Balance. Mohamed Salah's balance improved quickly at Fiorentina, within a short period. When he arrived at the Italian club, his rating was at its best up to that point (80%); he succeeded in reaching (84%) and he did not succeed for another


Second question

To investigate the second question we will use the Team_Attributes_df.

Merge player and player_att to create a new DataFrame that contains the player names and attributes.

Select the Spanish teams from team_df.

Check both DataFrames for null values.

# We already dropped the buildUpPlayDribbling column with drop_forty_null_column

Great, no null or duplicated values.

Inner join between the Spanish team names and attributes.

Let's have a look at the Spanish team attributes we just selected from team_df.

We will select the numeric attributes to explore the teams' attributes.

Checking the mean, min and max, everything looks normal.

Well, let's explore the offensive attributes of the teams

From the three charts above, it's clear that these attributes are very strong.

Well, let's explore the defensive attributes of the teams

It's clear that the grades for defense are lower than the offensive ones, which reflects the offensive style of the Spanish league.

Let's find the strongest Spanish teams.

Group by team name and sum all of each team's numeric attributes.

Total of all summed attributes, sorted in descending order.

Top ten Spanish teams.
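The grouping, summing and ranking steps above might look like the sketch below. The column `team_long_name`, the DataFrame name and the toy values are assumptions.

```python
import pandas as pd

# Toy Spanish-team attribute records (made-up values, one row per record).
spain_att = pd.DataFrame(
    {
        "team_long_name": ["Team A", "Team A", "Team B"],
        "buildUpPlaySpeed": [50, 60, 70],
        "chanceCreationShooting": [55, 65, 40],
    }
)
attribute_cols = ["buildUpPlaySpeed", "chanceCreationShooting"]

# Sum every numeric attribute per team, then add a grand total per team.
per_team = spain_att.groupby("team_long_name")[attribute_cols].sum()
per_team["total"] = per_team.sum(axis=1)

# Sort in descending order and keep the top ten.
ranking = per_team.sort_values("total", ascending=False)
top_ten = ranking.head(10)
print(top_ten)
```

A sum grows with the number of records a team has, so teams with more rows in the table rank higher for that reason alone. Replacing `.sum()` with `.mean()` in the `groupby` step is one way to remove that effect.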

Something is wrong here: FC Barcelona won the league six times between 2008 and 2016.

The result is not even close to the fact that FC Barcelona were champions six times in this exact period. So let's narrow our range to the attributes that relate to any strong team, namely the offensive part, the strength of the attacking build-up and the shooting:

* chanceCreationShooting

* chanceCreationPassing

* buildUpPlaySpeed

* buildUpPlayPassing

Total of the limited list of summed attributes, sorted in descending order.

No trace of Barcelona in the top ten again!

So let's extract the mean and median for the 4 attributes.

buildUpPlaySpeed mean = 268 < median = 276
buildUpPlayPassing mean = 260 < median = 261
chanceCreationPassing mean = 301 < median = 309
chanceCreationShooting mean = 311 < median = 319

From the above, all four distributions are left-skewed, which means that in general the Spanish teams have strong grades in these attributes.

# Total of the attributes selected as representative

Third Question


We already dropped 17 columns from the Match DataFrame because 40% or more of their values were null.

Select the columns we are interested in exploring from the Match DataFrame.

Great, no null or duplicated values.

Create a separate DataFrame copy for the home points.

Rename the team api_id column so it can be used in the merge.

Calculate the points for the home team per match.
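The points calculation can be sketched as below, assuming the usual league scoring of 3 points for a win, 1 for a draw and 0 for a loss, and assuming the goal columns are named `home_team_goal` and `away_team_goal`. The helper `match_points` is hypothetical and the rows are made up.

```python
import pandas as pd


def match_points(goals_for, goals_against):
    """Points earned in one match: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


# Made-up match results: a home win, a draw and an away win.
matches = pd.DataFrame({"home_team_goal": [2, 1, 0], "away_team_goal": [1, 1, 3]})

matches["home_points"] = [
    match_points(h, a)
    for h, a in zip(matches["home_team_goal"], matches["away_team_goal"])
]
# The same helper gives the away team's points with the arguments swapped.
matches["away_points"] = [
    match_points(a, h)
    for h, a in zip(matches["home_team_goal"], matches["away_team_goal"])
]
print(matches)
```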

Select the home-related columns to explore.

Merge ("inner join") for the home results.

Let's calculate the home and away goals for each season.

Let's calculate the home goals for each team from 2008 to 2016.

We notice Atlético Madrid in third place, but far behind the big two.

The top 10 in the Spanish league from 2008 to 2016 by home points.

From describe() we notice a team with only 9 points at home in 8 seasons.

It's hard to believe that the last-placed team earned only 9 points at home in 8 seasons! Córdoba CF 9

We notice Atlético Madrid in third place again!

Create a separate DataFrame copy for the away points.

Create a separate copy for the away names.

Rename the team api_id column so it can be used in the merge.

Calculate the points for the away team per match.

Select the away-related columns to explore.

Merge ("inner join") for the away results.

Calculate the total points for all teams in the period from 2008 to 2016.

Top ten teams by away points.

Top ten teams by away goals.

And Atlético Madrid is in third place again!

There is definitely something here that deserves a deeper investigation: Atlético Madrid is in third place again, as a pattern. But first, have a look at the stats.

Wow, we notice a team with only 5 points away in 8 seasons.

We wonder which team it is.

CD Numancia 5

Let's investigate the top three Spanish teams.

Sum the home points, grouped by season and then by team name.

Select only the results of the 3 teams from the away goals and points.

Sum the away points per season and then per team name.

Concatenate the two DataFrames, home and away, for the top three.
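A rough sketch of the per-season aggregation and the concatenation, under assumed column names (`season`, `team`, `points`) and with made-up rows for a single team:

```python
import pandas as pd

# Made-up per-match points for one team, split by venue.
home = pd.DataFrame(
    {
        "season": ["2008/2009", "2008/2009", "2009/2010"],
        "team": ["Team A", "Team A", "Team A"],
        "points": [3, 1, 3],
    }
)
away = pd.DataFrame(
    {
        "season": ["2008/2009", "2008/2009", "2009/2010"],
        "team": ["Team A", "Team A", "Team A"],
        "points": [0, 3, 1],
    }
)

# Sum the points per season and team, tagging each result with its venue.
home_by_season = (
    home.groupby(["season", "team"], as_index=False)["points"].sum().assign(venue="home")
)
away_by_season = (
    away.groupby(["season", "team"], as_index=False)["points"].sum().assign(venue="away")
)

# Stack both results, then add home and away points for a season total.
combined = pd.concat([home_by_season, away_by_season], ignore_index=True)
totals = combined.groupby(["season", "team"])["points"].sum()
print(totals)
```

Keeping the `venue` column in `combined` makes it possible to plot home and away points side by side before they are added together.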

A remarkable journey from 9th place in 2009 to champions in 2013. Atlético Madrid won La Liga in 2013/2014: Atlético Madrid 90, FC Barcelona 87, Real Madrid CF 87. Atlético repeated the achievement and won the championship again, but after 2016, which is outside the scope of our analysis.

From the bottom bar of each team, representing 2008, to the top one, we see an increase in Atlético's points compared to Real Madrid and Barcelona.

Conclusions

While tracing Mohamed Salah's career at the different clubs he played for, I saw a good rise in his performance between 2012 and 2014, then a decrease, and then a greater rise within a short period, specifically the first months of 2015. Salah's results changed a great deal during these periods. The first rise was the period of his first professional spell in Europe, with the Swiss club FC Basel, and it lasted for a year and a half. The greatest rise in his career during the years studied was in the first months of 2015, specifically between February and May 2015, when he was loaned by the English club Chelsea to Fiorentina.

The main characteristic of the Spanish league is an offensive style, seen in chanceCreationShooting, chanceCreationPassing, buildUpPlaySpeed and buildUpPlayPassing. As expected, Real Madrid shows strong attribute grades, but the dataset failed to show the same for the champions, Barcelona.

In investigating the third question we faced a repeated pattern. It is clear that the claims of a lack of competition in La Liga, with the title contested only between the two big teams, Real Madrid and Barcelona, are inaccurate. Atlético de Madrid won the league title once and performed very well from 2008 to 2016, climbing from ninth place to the top three several times. It is a strange result to find that CD Numancia has only 5 away points and Córdoba CF only 9 home points in all eight seasons. A likely explanation, not checked here, is that these clubs did not play in the top division in every season of the period.

Limitations

Out of curiosity, I investigated the English league using the same Team_Attributes table. Again, the results are not consistent with the real results: the top teams are at the bottom! Below is a screenshot of the final result.

england.png

Acknowledgments

www.geeksforgeeks.org www.stackoverflow.com www.analyticsvidhya.com